Overview
Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 899164 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 13439 |
| Duplicate rows (%) | 1.5% |
| Total size in memory | 89.2 MiB |
| Average record size in memory | 104.0 B |
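The derived rows in the table above follow from the raw counts. A minimal sketch, using only figures copied from that table (the variable names are illustrative), recomputes the duplicate-row share and the average record size:

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 899_164
n_duplicates = 13_439
total_mib = 89.2  # total size in memory, in MiB

# Share of observations reported as duplicate rows.
duplicate_pct = 100 * n_duplicates / n_rows

# Average record size in bytes (1 MiB = 1024 * 1024 bytes).
avg_record_bytes = total_mib * 1024 * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 1.5%
print(f"{avg_record_bytes:.1f} B")  # 104.0 B
```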
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 5 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 13439 (1.5%) duplicate rows | Duplicates |
| ApprovalFY is highly overall correlated with RetainedJob | High correlation |
| GrAppv is highly overall correlated with Term | High correlation |
| RetainedJob is highly overall correlated with ApprovalFY | High correlation |
| Term is highly overall correlated with GrAppv | High correlation |
| FranchiseCode is highly imbalanced (68.2%) | Imbalance |
| NoEmp is highly skewed (γ1 = 80.24824355) | Skewed |
| CreateJob is highly skewed (γ1 = 36.99135473) | Skewed |
| RetainedJob is highly skewed (γ1 = 36.85481184) | Skewed |
| NAICS has 201948 (22.5%) zeros | Zeros |
| CreateJob has 629248 (70.0%) zeros | Zeros |
| RetainedJob has 440403 (49.0%) zeros | Zeros |
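The 68.2% imbalance figure for FranchiseCode can be reproduced from its two category counts if one assumes the score is one minus the normalized Shannon entropy of the class distribution. That definition is an assumption here, not something this report states, although it does recover the reported value. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes.
    0 means perfectly balanced, 1 means a single class holds every row."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# FranchiseCode counts from this report: value 0 and value 1.
score = imbalance_score([847_389, 51_775])
print(f"{100 * score:.1f}%")  # 68.2%
```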
Reproduction
| Analysis started | 2025-02-14 00:19:58.837458 |
|---|---|
| Analysis finished | 2025-02-14 00:20:17.379147 |
| Duration | 18.54 seconds |
| Software version | ydata-profiling v4.12.2 |
| Download configuration | config.json |
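The reported duration is simply the difference between the two timestamps. A short check using the standard library, with the timestamps copied from the table above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2025-02-14 00:19:58.837458")
finished = datetime.fromisoformat("2025-02-14 00:20:17.379147")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 18.54 seconds
```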
Variables
State
Real number (ℝ)
| Distinct | 52 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1199.5479 |
| Minimum | 111 |
| Maximum | 4633 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 111 |
|---|---|
| 5-th percentile | 301 |
| Q1 | 612 |
| median | 1314 |
| Q3 | 1518 |
| 95-th percentile | 2301 |
| Maximum | 4633 |
| Range | 4522 |
| Interquartile range (IQR) | 906 |
Descriptive statistics
| Standard deviation | 648.31203 |
|---|---|
| Coefficient of variation (CV) | 0.54046362 |
| Kurtosis | -1.0275286 |
| Mean | 1199.5479 |
| Median Absolute Deviation (MAD) | 495 |
| Skewness | -0.07158061 |
| Sum | 1.0785903 × 10^9 |
| Variance | 420308.48 |
| Monotonicity | Not monotonic |
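Several of the State statistics above are derived from others in the same tables. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative, and small differences in the last digit are expected because the inputs are already rounded.

```python
# Values copied from the State tables above.
minimum, maximum = 111, 4633
q1, q3 = 612, 1518
mean, std = 1199.5479, 648.31203

value_range = maximum - minimum  # 4522
iqr = q3 - q1                    # 906
cv = std / mean                  # coefficient of variation
variance = std ** 2              # square of the standard deviation

print(value_range, iqr)
print(f"CV = {cv:.6f}")
print(f"Variance = {variance:.1f}")
```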
| Value | Count | Frequency (%) |
| 301 | 130619 | 14.5% |
| 2024 | 70458 | 7.8% |
| 1425 | 57693 | 6.4% |
| 612 | 41212 | 4.6% |
| 1601 | 35170 | 3.9% |
| 1508 | 32622 | 3.6% |
| 912 | 29669 | 3.3% |
| 1301 | 25272 | 2.8% |
| 1314 | 24373 | 2.7% |
| 1410 | 24035 | 2.7% |
| Other values (42) | 428041 | 47.6% |
| Value | Count | Frequency (%) |
| 111 | 2405 | 0.3% |
| 112 | 8362 | 0.9% |
| 118 | 6341 | 0.7% |
| 126 | 17631 | 2.0% |
| 301 | 130619 | 14.5% |
| 315 | 20605 | 2.3% |
| 320 | 12229 | 1.4% |
| 403 | 1613 | 0.2% |
| 405 | 2220 | 0.2% |
| 612 | 41212 | 4.6% |
| Value | Count | Frequency (%) |
| 4633 | 14 | < 0.1% |
| 2325 | 2839 | 0.3% |
| 2322 | 3287 | 0.4% |
| 2309 | 21040 | 2.3% |
| 2301 | 23263 | 2.6% |
| 2220 | 5454 | 0.6% |
| 2201 | 13264 | 1.5% |
| 2120 | 18776 | 2.1% |
| 2024 | 70458 | 7.8% |
| 2014 | 9403 | 1.0% |
NAICS
Real number (ℝ)
Zeros 
| Distinct | 25 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.612263 |
| Minimum | 0 |
| Maximum | 92 |
| Zeros | 201948 |
| Zeros (%) | 22.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 23 |
| median | 44 |
| Q3 | 56 |
| 95-th percentile | 81 |
| Maximum | 92 |
| Range | 92 |
| Interquartile range (IQR) | 33 |
Descriptive statistics
| Standard deviation | 26.284706 |
|---|---|
| Coefficient of variation (CV) | 0.66354972 |
| Kurtosis | -1.0572678 |
| Mean | 39.612263 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | -0.24819754 |
| Sum | 35617921 |
| Variance | 690.88577 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 201948 | 22.5% |
| 44 | 84737 | 9.4% |
| 81 | 72618 | 8.1% |
| 54 | 68170 | 7.6% |
| 72 | 67600 | 7.5% |
| 23 | 66646 | 7.4% |
| 62 | 55366 | 6.2% |
| 42 | 48743 | 5.4% |
| 45 | 42514 | 4.7% |
| 33 | 38284 | 4.3% |
| Other values (15) | 152538 | 17.0% |
| Value | Count | Frequency (%) |
| 0 | 201948 | 22.5% |
| 11 | 9005 | 1.0% |
| 21 | 1851 | 0.2% |
| 22 | 663 | 0.1% |
| 23 | 66646 | 7.4% |
| 31 | 11809 | 1.3% |
| 32 | 17936 | 2.0% |
| 33 | 38284 | 4.3% |
| 42 | 48743 | 5.4% |
| 44 | 84737 | 9.4% |
| Value | Count | Frequency (%) |
| 92 | 229 | < 0.1% |
| 81 | 72618 | 8.1% |
| 72 | 67600 | 7.5% |
| 71 | 14640 | 1.6% |
| 62 | 55366 | 6.2% |
| 61 | 6425 | 0.7% |
| 56 | 32685 | 3.6% |
| 55 | 257 | < 0.1% |
| 54 | 68170 | 7.6% |
| 53 | 13632 | 1.5% |
ApprovalFY
Real number (ℝ)
High correlation 
| Distinct | 51 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2001.1436 |
| Minimum | 1962 |
| Maximum | 2014 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 1962 |
|---|---|
| 5-th percentile | 1991 |
| Q1 | 1997 |
| median | 2002 |
| Q3 | 2006 |
| 95-th percentile | 2009 |
| Maximum | 2014 |
| Range | 52 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 5.9138459 |
|---|---|
| Coefficient of variation (CV) | 0.0029552332 |
| Kurtosis | -0.092531047 |
| Mean | 2001.1436 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.58537855 |
| Sum | 1.7993562 × 10^9 |
| Variance | 34.973573 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2005 | 77525 | 8.6% |
| 2006 | 76040 | 8.5% |
| 2007 | 71876 | 8.0% |
| 2004 | 68290 | 7.6% |
| 2003 | 58193 | 6.5% |
| 1995 | 45758 | 5.1% |
| 2002 | 44391 | 4.9% |
| 1996 | 40112 | 4.5% |
| 2008 | 39540 | 4.4% |
| 1997 | 37748 | 4.2% |
| Other values (41) | 339691 | 37.8% |
| Value | Count | Frequency (%) |
| 1962 | 1 | < 0.1% |
| 1965 | 1 | < 0.1% |
| 1966 | 1 | < 0.1% |
| 1967 | 2 | < 0.1% |
| 1968 | 2 | < 0.1% |
| 1969 | 4 | < 0.1% |
| 1970 | 8 | < 0.1% |
| 1971 | 20 | < 0.1% |
| 1972 | 27 | < 0.1% |
| 1973 | 52 | < 0.1% |
| Value | Count | Frequency (%) |
| 2014 | 268 | < 0.1% |
| 2013 | 2458 | 0.3% |
| 2012 | 5997 | 0.7% |
| 2011 | 12608 | 1.4% |
| 2010 | 16848 | 1.9% |
| 2009 | 19126 | 2.1% |
| 2008 | 39540 | 4.4% |
| 2007 | 71876 | 8.0% |
| 2006 | 76040 | 8.5% |
| 2005 | 77525 | 8.6% |
Term
Real number (ℝ)
High correlation 
| Distinct | 412 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 110.77308 |
| Minimum | 0 |
| Maximum | 569 |
| Zeros | 810 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 16 |
| Q1 | 60 |
| median | 84 |
| Q3 | 120 |
| 95-th percentile | 300 |
| Maximum | 569 |
| Range | 569 |
| Interquartile range (IQR) | 60 |
Descriptive statistics
| Standard deviation | 78.857305 |
|---|---|
| Coefficient of variation (CV) | 0.7118815 |
| Kurtosis | 0.18570424 |
| Mean | 110.77308 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 1.1209258 |
| Sum | 99603164 |
| Variance | 6218.4746 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 84 | 230162 | 25.6% |
| 60 | 89945 | 10.0% |
| 240 | 85982 | 9.6% |
| 120 | 77654 | 8.6% |
| 300 | 44727 | 5.0% |
| 180 | 28164 | 3.1% |
| 36 | 19800 | 2.2% |
| 12 | 17095 | 1.9% |
| 48 | 15621 | 1.7% |
| 72 | 9419 | 1.0% |
| Other values (402) | 280595 | 31.2% |
| Value | Count | Frequency (%) |
| 0 | 810 | 0.1% |
| 1 | 1608 | 0.2% |
| 2 | 1809 | 0.2% |
| 3 | 2112 | 0.2% |
| 4 | 2173 | 0.2% |
| 5 | 1866 | 0.2% |
| 6 | 3054 | 0.3% |
| 7 | 1761 | 0.2% |
| 8 | 1693 | 0.2% |
| 9 | 1875 | 0.2% |
| Value | Count | Frequency (%) |
| 569 | 1 | < 0.1% |
| 527 | 1 | < 0.1% |
| 511 | 1 | < 0.1% |
| 505 | 1 | < 0.1% |
| 481 | 1 | < 0.1% |
| 480 | 1 | < 0.1% |
| 461 | 1 | < 0.1% |
| 449 | 1 | < 0.1% |
| 445 | 1 | < 0.1% |
| 443 | 1 | < 0.1% |
NoEmp
Real number (ℝ)
Skewed 
| Distinct | 599 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.411353 |
| Minimum | 0 |
| Maximum | 9999 |
| Zeros | 6631 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 10 |
| 95-th percentile | 40 |
| Maximum | 9999 |
| Range | 9999 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 74.108196 |
|---|---|
| Coefficient of variation (CV) | 6.4942514 |
| Kurtosis | 7965.2886 |
| Mean | 11.411353 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 80.248244 |
| Sum | 10260678 |
| Variance | 5492.0248 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 154254 | 17.2% |
| 2 | 138297 | 15.4% |
| 3 | 90674 | 10.1% |
| 4 | 73644 | 8.2% |
| 5 | 60319 | 6.7% |
| 6 | 45759 | 5.1% |
| 10 | 31536 | 3.5% |
| 7 | 31495 | 3.5% |
| 8 | 31361 | 3.5% |
| 12 | 20822 | 2.3% |
| Other values (589) | 221003 | 24.6% |
| Value | Count | Frequency (%) |
| 0 | 6631 | 0.7% |
| 1 | 154254 | 17.2% |
| 2 | 138297 | 15.4% |
| 3 | 90674 | 10.1% |
| 4 | 73644 | 8.2% |
| 5 | 60319 | 6.7% |
| 6 | 45759 | 5.1% |
| 7 | 31495 | 3.5% |
| 8 | 31361 | 3.5% |
| 9 | 18131 | 2.0% |
| Value | Count | Frequency (%) |
| 9999 | 4 | < 0.1% |
| 9992 | 1 | < 0.1% |
| 9945 | 1 | < 0.1% |
| 9090 | 1 | < 0.1% |
| 9000 | 2 | < 0.1% |
| 8500 | 1 | < 0.1% |
| 8041 | 1 | < 0.1% |
| 8018 | 1 | < 0.1% |
| 8000 | 7 | < 0.1% |
| 7999 | 1 | < 0.1% |
NewExist
Categorical
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 644869 | 71.7% |
| 2 | 253125 | 28.2% |
| 0 | 1170 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 644869 | 71.7% |
| 2 | 253125 | 28.2% |
| 0 | 1170 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 644869 | 71.7% |
| 2 | 253125 | 28.2% |
| 0 | 1170 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 644869 | 71.7% |
| 2 | 253125 | 28.2% |
| 0 | 1170 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 644869 | 71.7% |
| 2 | 253125 | 28.2% |
| 0 | 1170 | 0.1% |
CreateJob
Real number (ℝ)
Skewed  Zeros 
| Distinct | 246 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.4303764 |
| Minimum | 0 |
| Maximum | 8800 |
| Zeros | 629248 |
| Zeros (%) | 70.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 10 |
| Maximum | 8800 |
| Range | 8800 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 236.68817 |
|---|---|
| Coefficient of variation (CV) | 28.075634 |
| Kurtosis | 1369.911 |
| Mean | 8.4303764 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 36.991355 |
| Sum | 7580291 |
| Variance | 56021.288 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 629248 | 70.0% |
| 1 | 63174 | 7.0% |
| 2 | 57831 | 6.4% |
| 3 | 28806 | 3.2% |
| 4 | 20511 | 2.3% |
| 5 | 18691 | 2.1% |
| 10 | 11602 | 1.3% |
| 6 | 11009 | 1.2% |
| 8 | 7378 | 0.8% |
| 7 | 6374 | 0.7% |
| Other values (236) | 44540 | 5.0% |
| Value | Count | Frequency (%) |
| 0 | 629248 | 70.0% |
| 1 | 63174 | 7.0% |
| 2 | 57831 | 6.4% |
| 3 | 28806 | 3.2% |
| 4 | 20511 | 2.3% |
| 5 | 18691 | 2.1% |
| 6 | 11009 | 1.2% |
| 7 | 6374 | 0.7% |
| 8 | 7378 | 0.8% |
| 9 | 3330 | 0.4% |
| Value | Count | Frequency (%) |
| 8800 | 648 | 0.1% |
| 5621 | 1 | < 0.1% |
| 5199 | 1 | < 0.1% |
| 5085 | 1 | < 0.1% |
| 3500 | 1 | < 0.1% |
| 3100 | 1 | < 0.1% |
| 3000 | 4 | < 0.1% |
| 2515 | 1 | < 0.1% |
| 2140 | 1 | < 0.1% |
| 2020 | 1 | < 0.1% |
RetainedJob
Real number (ℝ)
High correlation  Skewed  Zeros 
| Distinct | 358 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.797257 |
| Minimum | 0 |
| Maximum | 9500 |
| Zeros | 440403 |
| Zeros (%) | 49.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 20 |
| Maximum | 9500 |
| Range | 9500 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 237.1206 |
|---|---|
| Coefficient of variation (CV) | 21.961188 |
| Kurtosis | 1362.0182 |
| Mean | 10.797257 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 36.854812 |
| Sum | 9708505 |
| Variance | 56226.179 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 440403 | 49.0% |
| 1 | 88790 | 9.9% |
| 2 | 76851 | 8.5% |
| 3 | 49963 | 5.6% |
| 4 | 39666 | 4.4% |
| 5 | 32627 | 3.6% |
| 6 | 23796 | 2.6% |
| 7 | 16530 | 1.8% |
| 8 | 15698 | 1.7% |
| 10 | 15438 | 1.7% |
| Other values (348) | 99402 | 11.1% |
| Value | Count | Frequency (%) |
| 0 | 440403 | 49.0% |
| 1 | 88790 | 9.9% |
| 2 | 76851 | 8.5% |
| 3 | 49963 | 5.6% |
| 4 | 39666 | 4.4% |
| 5 | 32627 | 3.6% |
| 6 | 23796 | 2.6% |
| 7 | 16530 | 1.8% |
| 8 | 15698 | 1.7% |
| 9 | 8735 | 1.0% |
| Value | Count | Frequency (%) |
| 9500 | 1 | < 0.1% |
| 8800 | 648 | 0.1% |
| 7250 | 1 | < 0.1% |
| 5000 | 1 | < 0.1% |
| 4441 | 1 | < 0.1% |
| 4000 | 2 | < 0.1% |
| 3900 | 1 | < 0.1% |
| 3860 | 1 | < 0.1% |
| 3225 | 1 | < 0.1% |
| 3200 | 1 | < 0.1% |
FranchiseCode
Categorical
Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.9 MiB |
| 0 | 847389 |
|---|---|
| 1 | 51775 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 847389 | 94.2% |
| 1 | 51775 | 5.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 847389 | 94.2% |
| 1 | 51775 | 5.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 847389 | 94.2% |
| 1 | 51775 | 5.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 847389 | 94.2% |
| 1 | 51775 | 5.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 847389 | 94.2% |
| 1 | 51775 | 5.8% |
RevLineCr
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.9 MiB |
| 1 | 420288 |
|---|---|
| 0 | 262195 |
| 3 | 201397 |
| 2 | 15284 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 420288 | 46.7% |
| 0 | 262195 | 29.2% |
| 3 | 201397 | 22.4% |
| 2 | 15284 | 1.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 420288 | 46.7% |
| 0 | 262195 | 29.2% |
| 3 | 201397 | 22.4% |
| 2 | 15284 | 1.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 420288 | 46.7% |
| 0 | 262195 | 29.2% |
| 3 | 201397 | 22.4% |
| 2 | 15284 | 1.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 420288 | 46.7% |
| 0 | 262195 | 29.2% |
| 3 | 201397 | 22.4% |
| 2 | 15284 | 1.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 420288 | 46.7% |
| 0 | 262195 | 29.2% |
| 3 | 201397 | 22.4% |
| 2 | 15284 | 1.7% |
LowDoc
Categorical
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 788829 | 87.7% |
| 1 | 110335 | 12.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 788829 | 87.7% |
| 1 | 110335 | 12.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 788829 | 87.7% |
| 1 | 110335 | 12.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 788829 | 87.7% |
| 1 | 110335 | 12.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 788829 | 87.7% |
| 1 | 110335 | 12.3% |
MIS_Status
Categorical
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 741345 | 82.4% |
| 0 | 157819 | 17.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 741345 | 82.4% |
| 0 | 157819 | 17.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 741345 | 82.4% |
| 0 | 157819 | 17.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 741345 | 82.4% |
| 0 | 157819 | 17.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 899164 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 741345 | 82.4% |
| 0 | 157819 | 17.6% |
GrAppv
Real number (ℝ)
High correlation 
| Distinct | 22128 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 192686.98 |
| Minimum | 200 |
| Maximum | 5472000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 MiB |
Quantile statistics
| Minimum | 200 |
|---|---|
| 5-th percentile | 10000 |
| Q1 | 35000 |
| median | 90000 |
| Q3 | 225000 |
| 95-th percentile | 750000 |
| Maximum | 5472000 |
| Range | 5471800 |
| Interquartile range (IQR) | 190000 |
Descriptive statistics
| Standard deviation | 283263.39 |
|---|---|
| Coefficient of variation (CV) | 1.4700702 |
| Kurtosis | 21.018882 |
| Mean | 192686.98 |
| Median Absolute Deviation (MAD) | 65000 |
| Skewness | 3.5207901 |
| Sum | 1.7325719 × 10^11 |
| Variance | 8.0238149 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 50000 | 69394 | 7.7% |
| 25000 | 51258 | 5.7% |
| 100000 | 50977 | 5.7% |
| 10000 | 38366 | 4.3% |
| 150000 | 27624 | 3.1% |
| 20000 | 23434 | 2.6% |
| 35000 | 23181 | 2.6% |
| 30000 | 21004 | 2.3% |
| 5000 | 19146 | 2.1% |
| 15000 | 18472 | 2.1% |
| Other values (22118) | 556308 | 61.9% |
| Value | Count | Frequency (%) |
| 200 | 2 | < 0.1% |
| 300 | 1 | < 0.1% |
| 400 | 2 | < 0.1% |
| 500 | 33 | < 0.1% |
| 700 | 4 | < 0.1% |
| 800 | 4 | < 0.1% |
| 950 | 1 | < 0.1% |
| 1000 | 444 | < 0.1% |
| 1200 | 12 | < 0.1% |
| 1300 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 5472000 | 1 | < 0.1% |
| 5000000 | 40 | < 0.1% |
| 4991700 | 1 | < 0.1% |
| 4950000 | 1 | < 0.1% |
| 4908500 | 1 | < 0.1% |
| 4900000 | 2 | < 0.1% |
| 4872000 | 1 | < 0.1% |
| 4869000 | 1 | < 0.1% |
| 4830000 | 1 | < 0.1% |
| 4800000 | 1 | < 0.1% |
Interactions
Correlations
| | ApprovalFY | CreateJob | FranchiseCode | GrAppv | LowDoc | MIS_Status | NAICS | NewExist | NoEmp | RetainedJob | RevLineCr | State | Term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ApprovalFY | 1.000 | 0.268 | 0.048 | -0.300 | 0.375 | 0.327 | 0.440 | 0.062 | -0.226 | 0.546 | 0.359 | -0.001 | -0.297 |
| CreateJob | 0.268 | 1.000 | 0.001 | 0.093 | 0.010 | 0.012 | 0.156 | 0.009 | 0.034 | 0.377 | 0.016 | -0.032 | 0.082 |
| FranchiseCode | 0.048 | 0.001 | 1.000 | 0.065 | 0.028 | 0.015 | 0.222 | 0.142 | 0.002 | 0.004 | 0.129 | 0.036 | 0.105 |
| GrAppv | -0.300 | 0.093 | 0.065 | 1.000 | 0.116 | 0.074 | -0.142 | 0.037 | 0.455 | -0.138 | 0.099 | -0.067 | 0.558 |
| LowDoc | 0.375 | 0.010 | 0.028 | 0.116 | 1.000 | 0.084 | 0.154 | 0.161 | 0.003 | 0.010 | 0.226 | 0.087 | 0.169 |
| MIS_Status | 0.327 | 0.012 | 0.015 | 0.074 | 0.084 | 1.000 | 0.148 | 0.022 | 0.004 | 0.013 | 0.146 | 0.054 | 0.491 |
| NAICS | 0.440 | 0.156 | 0.222 | -0.142 | 0.154 | 0.148 | 1.000 | 0.093 | -0.151 | 0.268 | 0.214 | -0.000 | -0.076 |
| NewExist | 0.062 | 0.009 | 0.142 | 0.037 | 0.161 | 0.022 | 0.093 | 1.000 | 0.004 | 0.002 | 0.065 | 0.070 | 0.088 |
| NoEmp | -0.226 | 0.034 | 0.002 | 0.455 | 0.003 | 0.004 | -0.151 | 0.004 | 1.000 | 0.124 | 0.005 | -0.040 | 0.200 |
| RetainedJob | 0.546 | 0.377 | 0.004 | -0.138 | 0.010 | 0.013 | 0.268 | 0.002 | 0.124 | 1.000 | 0.016 | -0.030 | -0.157 |
| RevLineCr | 0.359 | 0.016 | 0.129 | 0.099 | 0.226 | 0.146 | 0.214 | 0.065 | 0.005 | 0.016 | 1.000 | 0.046 | 0.242 |
| State | -0.001 | -0.032 | 0.036 | -0.067 | 0.087 | 0.054 | -0.000 | 0.070 | -0.040 | -0.030 | 0.046 | 1.000 | -0.088 |
| Term | -0.297 | 0.082 | 0.105 | 0.558 | 0.169 | 0.491 | -0.076 | 0.088 | 0.200 | -0.157 | 0.242 | -0.088 | 1.000 |
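The "High correlation" alerts in the overview can be traced back to this matrix. The sketch below scans the upper triangle for pairs whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption (the report does not state it), chosen because it reproduces exactly the two pairs flagged in the alerts; the helper name `high_pairs` is illustrative.

```python
names = ["ApprovalFY", "CreateJob", "FranchiseCode", "GrAppv", "LowDoc",
         "MIS_Status", "NAICS", "NewExist", "NoEmp", "RetainedJob",
         "RevLineCr", "State", "Term"]

# Coefficients copied row by row from the correlation table above.
matrix = [
    [1.000, 0.268, 0.048, -0.300, 0.375, 0.327, 0.440, 0.062, -0.226, 0.546, 0.359, -0.001, -0.297],
    [0.268, 1.000, 0.001, 0.093, 0.010, 0.012, 0.156, 0.009, 0.034, 0.377, 0.016, -0.032, 0.082],
    [0.048, 0.001, 1.000, 0.065, 0.028, 0.015, 0.222, 0.142, 0.002, 0.004, 0.129, 0.036, 0.105],
    [-0.300, 0.093, 0.065, 1.000, 0.116, 0.074, -0.142, 0.037, 0.455, -0.138, 0.099, -0.067, 0.558],
    [0.375, 0.010, 0.028, 0.116, 1.000, 0.084, 0.154, 0.161, 0.003, 0.010, 0.226, 0.087, 0.169],
    [0.327, 0.012, 0.015, 0.074, 0.084, 1.000, 0.148, 0.022, 0.004, 0.013, 0.146, 0.054, 0.491],
    [0.440, 0.156, 0.222, -0.142, 0.154, 0.148, 1.000, 0.093, -0.151, 0.268, 0.214, -0.000, -0.076],
    [0.062, 0.009, 0.142, 0.037, 0.161, 0.022, 0.093, 1.000, 0.004, 0.002, 0.065, 0.070, 0.088],
    [-0.226, 0.034, 0.002, 0.455, 0.003, 0.004, -0.151, 0.004, 1.000, 0.124, 0.005, -0.040, 0.200],
    [0.546, 0.377, 0.004, -0.138, 0.010, 0.013, 0.268, 0.002, 0.124, 1.000, 0.016, -0.030, -0.157],
    [0.359, 0.016, 0.129, 0.099, 0.226, 0.146, 0.214, 0.065, 0.005, 0.016, 1.000, 0.046, 0.242],
    [-0.001, -0.032, 0.036, -0.067, 0.087, 0.054, -0.000, 0.070, -0.040, -0.030, 0.046, 1.000, -0.088],
    [-0.297, 0.082, 0.105, 0.558, 0.169, 0.491, -0.076, 0.088, 0.200, -0.157, 0.242, -0.088, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report

def high_pairs(names, matrix, threshold):
    """List (name_a, name_b, coefficient) for each pair above the diagonal
    whose absolute coefficient is at least `threshold`."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

flagged = high_pairs(names, matrix, THRESHOLD)
for a, b, r in flagged:
    print(f"{a} ~ {b}: {r:.3f}")
```

With this cutoff the Term and MIS_Status pair (0.491) falls just short of being flagged.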
Missing values
Sample
First rows
| | State | NAICS | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | RevLineCr | LowDoc | MIS_Status | GrAppv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 914 | 45 | 1997 | 84 | 4 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 60000 |
| 1 | 914 | 72 | 1997 | 60 | 2 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 40000 |
| 2 | 914 | 62 | 1997 | 180 | 7 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 287000 |
| 3 | 1511 | 0 | 1997 | 60 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 35000 |
| 4 | 612 | 0 | 1997 | 240 | 14 | 1 | 7 | 7 | 0 | 1 | 0 | 1 | 229000 |
| 5 | 320 | 33 | 1997 | 120 | 19 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 517000 |
| 6 | 1410 | 0 | 1980 | 45 | 45 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 600000 |
| 7 | 612 | 81 | 1997 | 84 | 1 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 45000 |
| 8 | 612 | 72 | 1997 | 297 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | 1 | 305000 |
| 9 | 320 | 0 | 1997 | 84 | 3 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 70000 |
Last rows
| | State | NAICS | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | RevLineCr | LowDoc | MIS_Status | GrAppv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 899154 | 1508 | 0 | 1997 | 60 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 10000 |
| 899155 | 1425 | 62 | 1997 | 180 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 128000 |
| 899156 | 1304 | 33 | 1997 | 60 | 20 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 50000 |
| 899157 | 301 | 31 | 1997 | 36 | 40 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 200000 |
| 899158 | 2024 | 0 | 1997 | 84 | 5 | 2 | 0 | 0 | 0 | 1 | 1 | 1 | 79000 |
| 899159 | 1508 | 45 | 1997 | 60 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 70000 |
| 899160 | 1508 | 45 | 1997 | 60 | 6 | 1 | 0 | 0 | 0 | 3 | 0 | 1 | 85000 |
| 899161 | 301 | 33 | 1997 | 108 | 26 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 300000 |
| 899162 | 809 | 0 | 1997 | 60 | 6 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 75000 |
| 899163 | 809 | 0 | 1997 | 48 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 1 | 30000 |
Duplicate rows
Most frequently occurring
| | State | NAICS | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | RevLineCr | LowDoc | MIS_Status | GrAppv | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10666 | 1601 | 54 | 2003 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 5000 | 32 |
| 847 | 301 | 0 | 1998 | 84 | 1 | 1 | 0 | 0 | 0 | 3 | 0 | 1 | 10000 | 31 |
| 4285 | 612 | 54 | 2003 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 10000 | 30 |
| 4316 | 612 | 54 | 2004 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 10000 | 30 |
| 9929 | 1508 | 54 | 2008 | 60 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 1 | 25000 | 24 |
| 10889 | 1601 | 81 | 2003 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 5000 | 23 |
| 10698 | 1601 | 54 | 2004 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 5000 | 22 |
| 2229 | 301 | 48 | 2008 | 12 | 145 | 1 | 124 | 145 | 0 | 3 | 0 | 1 | 1500 | 20 |
| 2494 | 301 | 54 | 2004 | 84 | 2 | 1 | 0 | 2 | 0 | 3 | 0 | 1 | 10000 | 20 |
| 4439 | 612 | 56 | 2004 | 84 | 1 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 10000 | 18 |
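The `# duplicates` column counts how often each identical row occurs. A minimal sketch of that grouping, run on a tiny made-up sample (the rows and their four-column shape are illustrative and not taken from the dataset, and this is not necessarily how the profiling tool computes it):

```python
from collections import Counter

# Hypothetical rows; each tuple stands for one complete record.
rows = [
    (1601, 54, 2003, 84),
    (1601, 54, 2003, 84),
    (301, 0, 1998, 84),
    (1601, 54, 2003, 84),
    (612, 54, 2004, 84),
]

# Count occurrences of each distinct row and keep those seen more than once.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

# Sort so the most frequently repeated rows come first.
most_frequent = sorted(duplicated.items(), key=lambda kv: kv[1], reverse=True)
for row, n in most_frequent:
    print(row, n)
```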